- 'Everyone' is the most common content rating.
- Free apps are preferred over Paid apps.
- Number of installs for Free apps compared to Paid apps.
- Top ten (10) Categories.
- Top ten (10) Genres.
- Category Concentration.
- Paid Apps (Earnings per Category).
- Relationship between App, Category and Type.
The dataset used for this exploration was web scraped from the Google Play Store by Lavanya Gupta in 2018. App analytics of this kind are usually provided by companies such as App Annie or Sensor Tower, and they help many companies drive their development and app marketing strategies.
# remove apps priced at 250 or more (keep only prices below 250)
df_gapps_clean = df_gapps_clean[df_gapps_clean['Price'] < 250 ]
df_gapps_clean.sort_values('Price', ascending=False).head()
| | App | Category | Rating | Reviews | Size_MBs | Installs | Type | Price | Content_Rating | Genres |
|---|---|---|---|---|---|---|---|---|---|---|
| 2281 | Vargo Anesthesia Mega App | MEDICAL | 4.60 | 92 | 32.00 | 1000 | Paid | 79.99 | Everyone | Medical |
| 1407 | LTC AS Legal | MEDICAL | 4.00 | 6 | 1.30 | 100 | Paid | 39.99 | Everyone | Medical |
| 2629 | I am Rich Person | LIFESTYLE | 4.20 | 134 | 1.80 | 1000 | Paid | 37.99 | Everyone | Lifestyle |
| 2481 | A Manual of Acupuncture | MEDICAL | 3.50 | 214 | 68.00 | 1000 | Paid | 33.99 | Everyone | Medical |
| 2463 | PTA Content Master | MEDICAL | 4.20 | 64 | 41.00 | 1000 | Paid | 29.99 | Everyone | Medical |
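The filter above keeps only the rows whose price is strictly below 250. Here is a minimal sketch on a toy frame (the app names and prices are invented for illustration, not rows from the dataset) showing how the boolean mask behaves and how to count the rows it drops:

```python
import pandas as pd

# Hypothetical toy data; not rows from the real dataset
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Price": [0.0, 79.99, 250.0, 399.99],
})

# Boolean mask: True where the price is strictly below the cutoff
mask = toy["Price"] < 250
toy_clean = toy[mask]

# Number of rows removed by the filter
n_removed = len(toy) - len(toy_clean)
print(toy_clean)
print(f"removed {n_removed} rows")
```

Note that an app priced at exactly 250 is dropped as well, because the comparison is strict.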
- 'Everyone' is the most common content rating.
- Free apps are preferred over Paid apps.
- Top ten (10) Categories.
- Top ten (10) Genres.
content_ratings = df_gapps_clean['Content_Rating'].value_counts()
# plot a pie chart (Plotly creates its own figure, so no plt.figure() call is needed)
figr = px.pie(values=content_ratings.values,
              names=content_ratings.index,
              title="Content Rating")
# show the percentage and label outside each slice
figr.update_traces(textposition="outside", textinfo="percent+label")
figr.show()
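The pie chart shows each content rating as a share of all apps. The same shares can be read off as numbers with `value_counts(normalize=True)`. This is only a sketch: the `ratings` series below is a made-up stand-in for `df_gapps_clean['Content_Rating']`.

```python
import pandas as pd

# Hypothetical ratings column standing in for df_gapps_clean['Content_Rating']
ratings = pd.Series(["Everyone", "Everyone", "Teen", "Everyone", "Mature 17+"])

# normalize=True returns fractions that sum to 1 instead of raw counts
shares = ratings.value_counts(normalize=True)

# Express the fractions as percentages, rounded to one decimal place
print((shares * 100).round(1))
```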
paid_vs_free_apps = df_gapps_clean['Type'].value_counts()
# plot a donut chart (Plotly creates its own figure, so no plt.figure() call is needed)
fig = px.pie(values=paid_vs_free_apps.values,
             names=paid_vs_free_apps.index,
             hole=0.6,
             title="Paid Vs Free Apps")
# show the percentage and label outside each slice
fig.update_traces(textposition="outside", textinfo="percent+label")
fig.show()
top_10_categories = df_gapps_clean['Category'].value_counts()
# plot bar chart
plt.figure(figsize=(12, 10))
plt.xticks(rotation=30)
plt.xlabel("Top 10 Categories")
plt.ylabel("Count")
plt.title("Top 10 Categories for Apps")
fig = plt.bar(x=top_10_categories.index[:10], height=top_10_categories.values[:10]);
plt.show()
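The bar chart relies on `value_counts()` returning its counts sorted from largest to smallest, which is why slicing the first ten entries gives the top ten. A small sketch with invented category labels (and a top two instead of a top ten, to keep it short):

```python
import pandas as pd

# Hypothetical category column; the labels and counts are made up
categories = pd.Series(["FAMILY"] * 3 + ["GAME"] * 2 + ["TOOLS"])

# value_counts() sorts in descending order of frequency by default
counts = categories.value_counts()

# head(n) therefore returns the n most frequent categories
top_2 = counts.head(2)
print(top_2)
```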
top_10_genres = df_gapps_clean['Genres'].value_counts()
# plot horizontal bar chart
plt.figure(figsize=(10, 8))
plt.yticks(rotation=30)
plt.xlabel("Count")
plt.ylabel("Genres")
plt.title("Top 10 Genres")
plt.barh(top_10_genres[:10].index, top_10_genres[:10].values);
plt.show()
- Number of installs for Free apps compared to Paid apps.
- Category Concentration.
- Paid Apps (Earnings per Category).
# build a dataframe for this relationship: group by Category and sum the number of installs
install_category = df_gapps_clean.groupby("Category").agg({"Installs": pd.Series.sum})
# groupby Category and count the number of apps
numb_category = df_gapps_clean.groupby("Category").agg({"App": pd.Series.count})
# the new dataframe created
df_merger = pd.merge(numb_category, install_category, on="Category", how="inner")
df_merger.head()
| Category | App | Installs |
|---|---|---|
| ART_AND_DESIGN | 61 | 114233100 |
| AUTO_AND_VEHICLES | 73 | 53129800 |
| BEAUTY | 42 | 26916200 |
| BOOKS_AND_REFERENCE | 169 | 1665791655 |
| BUSINESS | 262 | 692018120 |
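The two group-bys and the merge above could probably be collapsed into a single step with pandas named aggregation. This is an alternative sketch, not the approach used in this notebook, and the toy frame below is invented for illustration:

```python
import pandas as pd

# Hypothetical toy data with the same column names as df_gapps_clean
toy = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Category": ["BEAUTY", "BEAUTY", "BUSINESS", "BUSINESS"],
    "Installs": [100, 1000, 50, 500],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = toy.groupby("Category").agg(
    App=("App", "count"),          # number of apps per category
    Installs=("Installs", "sum"),  # total installs per category
)
print(summary)
```

The result has the same shape as `df_merger`: one row per category, with an `App` count and an `Installs` total.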
# plot scatter plot
fig = px.scatter(df_merger, # one row per category
                 x='App', # number of apps in the category
                 y='Installs', # total installs in the category
                 size="App",
                 hover_name=df_merger.index,
                 color="Installs",
                 title="Category Concentration")
fig.update_layout(xaxis_title="Number Of Apps",
yaxis_title="Installs",
yaxis=dict(type="log"))
fig.show()
# plot box plot
fig = px.box(df_gapps_clean,
x="Type",
y= "Installs",
color = "Type",
notched=True,
title="How many installs do Paid apps get compared to Free apps?",
)
fig.update_layout(xaxis_title="Type",
yaxis_title="Installs",
yaxis=dict(type=("log")))
fig.show()
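The box plot compares the distribution of installs for each type on a log scale. To put a number on the gap, the median installs per type can be computed directly. A minimal sketch with invented install counts:

```python
import pandas as pd

# Hypothetical toy data; the install counts are made up
toy = pd.DataFrame({
    "Type": ["Free", "Free", "Free", "Paid", "Paid", "Paid"],
    "Installs": [1_000, 100_000, 10_000_000, 100, 1_000, 10_000],
})

# Median installs per type (the centre line of each box in a box plot)
medians = toy.groupby("Type")["Installs"].median()
print(medians)
```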
# box plot of the revenue estimate for paid apps, by category
paid_gapps = df_gapps_clean[df_gapps_clean["Type"] == "Paid"]
box = px.box(paid_gapps,
             x="Category",
             y="Revenue_Estimate",
             title="How much can be earned on Paid Apps per Category?")
box.update_layout(xaxis_title="Category",
                  yaxis_title="Paid App Ballpark Revenue",
                  xaxis={"categoryorder": "min ascending"},
                  yaxis=dict(type="log"))
box.show()
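The box plot above depends on a `Revenue_Estimate` column whose construction is not shown in this section. One plausible ballpark, assumed here rather than confirmed by the notebook, is price multiplied by installs, which treats every install as a single purchase at the listed price:

```python
import pandas as pd

# Hypothetical toy data; prices and install counts are made up
toy = pd.DataFrame({
    "App": ["a", "b", "c"],
    "Type": ["Paid", "Paid", "Free"],
    "Price": [2.0, 5.0, 0.0],
    "Installs": [1000, 100, 5000],
})

# Assumed definition: revenue estimate = price * installs
toy["Revenue_Estimate"] = toy["Price"] * toy["Installs"]
print(toy[["App", "Revenue_Estimate"]])
```

Free apps come out at zero under this definition, so they would need to be filtered out (as done with `paid_gapps`) before plotting on a log axis.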
- Relationship between App, Category and Type.
# free vs paid apps
free_vs_paid = df_gapps_clean.groupby(["Category", "Type"], as_index=False).agg({"App": pd.Series.count})
# plot a clustered bar chart
plt.figure(figsize=(16, 10))
plt.xlabel("Category")
plt.ylabel("App")
plt.title("Relationship between App, Category and Type.")
plt.xticks(rotation=90)
sb.barplot(data=free_vs_paid, x="Category", y="App", hue="Type");
plt.show()
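The clustered bar chart can be cross-checked against a plain table of counts. A sketch using `pd.crosstab` on invented rows, with the same `Category` and `Type` column names as the dataset:

```python
import pandas as pd

# Hypothetical toy data; not rows from the real dataset
toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "GAME", "MEDICAL", "MEDICAL"],
    "Type": ["Free", "Free", "Paid", "Paid", "Paid"],
})

# Count apps for every Category/Type pair; missing pairs are filled with 0
table = pd.crosstab(toy["Category"], toy["Type"])
print(table)
```

Unlike the `groupby` result, the cross-tabulation keeps an explicit zero for combinations that never occur, which makes categories with no paid (or no free) apps easy to spot.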